Jia Lei Qian
Recently, Immigration, Refugees and Citizenship Canada (IRCC) held an unprecedented Express Entry draw, inviting 27,332 candidates who meet the Canadian Experience Class requirements to apply for permanent residency in Canada. The cut-off score for the draw dropped to 75 points, the lowest in the program's history. Economists generally agree that immigration is a necessary part of achieving economic growth and keeping taxpayer-funded systems stable and balanced. In addition, according to Statistics Canada, Asia was Canada's largest source of immigrants. This study therefore explores the relationship between the number of Asian immigrants and Canada's economic (GDP) growth, and deliberately creates a misleading data visualization of it.
Two datasets are used in this assignment: the Immigration to Canada IBM Dataset retrieved from Kaggle and Canada's GDP data from World Bank Data. Since the immigration dataset only covers 1980 to 2013, the analysis is restricted to this time range. The visualizations are built with Plotly Express, the high-level interface of Plotly, an open-source graphing library available for both Python and R. Plotly renders its figures with the JavaScript library plotly.js, which is what makes the graphics interactive.
The entire design process is divided into the following two parts: Data Exploration and Design Iteration.
This analysis first generates a pie chart of immigration numbers by continent and a time series plot of Canada's total GDP from 1980 to 2013. The pie chart shows that Asian immigrants make up more than half of Canada's immigrants, while the time series plot reveals an overall increasing trend in total GDP, except for a contraction in 2009 compared with the previous year. The drop is highlighted in red.
# For working with the data
import pandas as pd
import numpy as np
#Data visualization
import plotly.express as px
#read xlsx file
df3= pd.read_excel("Canada.xlsx",sheet_name='Canada by Citizenship',skiprows=range(20),skipfooter=2)
#display first 5 rows of the dataset
df3.head()
#select the necessary columns
Canada_df=df3.copy()
Canada_df.drop(["AREA", "REG", "DEV", "Type","Coverage"], axis = 1, inplace=True)
#rename the column
Canada_df.rename(columns = {'OdName':'Country', 'AreaName':'Continent','RegName':'Region'}, inplace= True)
#Sum the yearly counts for each country; numeric_only=True skips the text columns (required on recent pandas versions)
Canada_df["Total number"]= Canada_df.sum(axis=1, numeric_only=True)
#Draw a pie chart to show the proportion of the number of immigrants in Canada grouped by different continent
fig = px.pie(Canada_df, values='Total number', names='Continent', color_discrete_sequence=px.colors.sequential.RdBu,
title="Total number of immigrants in Canada from different continents (1980-2013)")
fig.show()
#Source: World Bank Data: https://data.worldbank.org/indicator/NY.GDP.MKTP.CD?locations=CA
Canada_gdp= pd.read_csv("Canadagdp.csv")
#Select the required columns
Canada_gdp=Canada_gdp.drop(["Continent","Region","DevName"],axis=1)
#Restructure the dataset
Canada_gdp2 = pd.melt(Canada_gdp,
id_vars=['Country'],
var_name='year',value_name='Total GDP')
#Change the type of the "year" column
Canada_gdp2['year'] = Canada_gdp2['year'].astype(float)
#Show the head of the dataset
Canada_gdp2.head()
fig = px.line(Canada_gdp2, x="year", y="Total GDP",title = "Trend of Canada Total GDP from 1980 to 2013")
fig.add_vrect(x0=2008,x1=2009, fillcolor="salmon", opacity=0.5,layer="below", line_width=0,)
fig.show()
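The 2009 dip can also be confirmed numerically rather than by eye. The following is a minimal sketch, assuming a long-format frame with `year` and `Total GDP` columns like `Canada_gdp2`. The frame `toy_gdp` and its values are made-up placeholders for illustration, not World Bank figures.

```python
import pandas as pd

# Placeholder values for illustration only (NOT real GDP figures).
toy_gdp = pd.DataFrame({
    "year": [2006, 2007, 2008, 2009, 2010],
    "Total GDP": [100.0, 110.0, 120.0, 105.0, 125.0],
})

# Year-over-year change; the first row has no previous year, so it is NaN.
toy_gdp = toy_gdp.sort_values("year")
toy_gdp["change"] = toy_gdp["Total GDP"].diff()

# Years in which GDP fell relative to the previous year.
decline_years = toy_gdp.loc[toy_gdp["change"] < 0, "year"].tolist()
print(decline_years)
```

Applied to the real data, the same `diff()` step would list every year with a decline, which could then drive the highlighted region instead of hard-coding 2008 to 2009.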
To further explore the relationship between immigration and economic growth, I decided to use a bubble chart built with Plotly Express for the first iteration. A bubble chart is a variation of a scatter plot that can visualize three different measures simultaneously.
Bubble charts are usually used to show and compare relationships and distributions in data. The correlation between data dimensions is read from the position and size of the bubbles, which makes rough relative comparisons easy. However, the bubble chart is also a controversial graphic that can mislead readers. It is difficult to compare numerical values precisely from bubble sizes, and it takes time for the reader to interpret all the elements of the graph. Therefore, it fits this assignment's theme of creating a deceptive chart and can be used to tell a misleading data story about the impact of immigration on the Canadian economy. The entire design process involves four iterations.
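One reason bubble sizes are hard to compare can be shown with a little arithmetic. The sketch below assumes area-based sizing, meaning the bubble's area (not its diameter) is proportional to the value; the helper `diameter_ratio` is a hypothetical name introduced only for this illustration.

```python
import math

def diameter_ratio(value_a, value_b):
    """Ratio of bubble diameters when bubble AREA is proportional to the value."""
    return math.sqrt(value_a / value_b)

# A value four times larger produces a bubble only twice as wide.
print(diameter_ratio(4, 1))

# A value twice as large produces a bubble about 1.41 times as wide.
print(round(diameter_ratio(2, 1), 2))
```

Under this assumption, readers who judge bubbles by width rather than area will tend to underestimate large differences between values.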
In the first iteration, I drew an interactive bubble chart of the total number of Asian immigrants grouped by country classification. It shows the total number of immigrants from each Asian country to Canada from 1980 to 2013. The bubble's size represents the number of immigrants, while two colors differentiate between developed and developing countries. According to the graph, as of 2013 India had sent the largest number of immigrants to Canada, followed by China and the Philippines. In contrast, the number of immigrants from Japan, a developed country, is small.
Canada_df1=Canada_df.copy()
#rename the column
Canada_df1.rename(columns = {'Total number':'Total number of asian immigrants'}, inplace= True)
#Plot a bubble chart
fig = px.scatter(Canada_df1[Canada_df1.Continent == 'Asia'], x='Total number of asian immigrants', y='Country',
size='Total number of asian immigrants', color='DevName',
title = "Total number of Asian immigrants grouped by country classification from 1980 to 2013")
fig.show()
Since the first draft does not show year-to-year differences and only one Asian country is classified as developed, the second version adds an animated timeline and groups the data by Asian region instead. By dividing the plot into four subplots based on region and setting "animation_frame='year'" and "facet_col='Region'", one can see how the bubble chart evolves over time. After clicking the start button, the movement of the bubbles shows the trend in the number of immigrants from each country to Canada. It is interesting to see that over the period the largest fluctuations in the number of migrants occurred in Southern and Eastern Asia, and the smallest in the Middle East.
#Limit the region into Asia
Canada_df_reg = Canada_df.query("Continent in ['Asia']")
#Decide to melt the dataset
Canada_df2 = pd.melt(Canada_df_reg,
id_vars=['Country','Continent' ,'Region', 'DevName','Total number'],
var_name='year',value_name='Immigration number')
#Plot a bubble chart with animated timeline
fig = px.scatter(
Canada_df2,
x="Immigration number",
y="Country",
animation_frame="year",
animation_group="Country",
hover_name="Region",
facet_col="Region",
title="Changing number of Asian immigrants grouped by country region from 1980 to 2013"
)
fig.update_layout(autosize=False,height=500,width=1000,
font=dict(size=10))
fig.show()
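The animation relies on the wide-to-long reshaping done by `pd.melt`, which turns one column per year into one row per country and year. A minimal sketch of that step on a toy frame follows; the countries, regions, and counts are invented placeholders, not values from the immigration dataset.

```python
import pandas as pd

# Toy wide-format frame: one row per country, one column per year.
wide_df = pd.DataFrame({
    "Country": ["A", "B"],
    "Region": ["R1", "R2"],
    1980: [10, 20],
    1981: [15, 25],
})

# Melt into long format: one row per (country, year) pair.
long_df = pd.melt(
    wide_df,
    id_vars=["Country", "Region"],
    var_name="year",
    value_name="Immigration number",
)

print(long_df)
```

Each distinct value in the resulting `year` column can then serve as one animation frame.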
The third version combines the immigration and GDP datasets to mislead viewers. It changes the size and color of the bubbles: the larger the bubble, the higher Canada's total GDP in that year, and the lighter the bubble's color, the more immigrants arrived from that country in the corresponding year. The chart makes it appear that, as the number of Asian immigrants from different countries increases, the overall economy grows, which matches the previous time series plot.
#Merge the dataset with the GDP data
Canada_df3=Canada_df2.merge(Canada_gdp2,on="year",how="left")
#Drop irrelevant column
Canada_df3=Canada_df3.drop(["Country_y"],axis=1)
#Rename the column
Canada_df3.rename(columns = {'Country_x':'Country'}, inplace= True)
#Show the head of the dataframe
Canada_df3.head()
# Plot a misleading bubble chart
fig = px.scatter(
Canada_df3,
x="Immigration number",
y="Country",
animation_frame="year",
animation_group="Country",
size="Total GDP",
color="Immigration number",
facet_col="Region",
title="Changing number of Asian immigrants sizing by Total GDP in different regions (1980-2013)"
)
fig.update_layout(autosize=False,height=500,width=1100,
font=dict(size=10))
fig.show()
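Part of what makes this chart deceptive follows from the merge itself: joining a single national GDP series to per-country rows on `year` gives every country the same GDP value within a frame, so bubble size carries no country-level information. A minimal sketch with invented placeholder values (not taken from either dataset) illustrates this.

```python
import pandas as pd

# Toy long-format immigration data: two countries, two years (placeholder values).
immigration = pd.DataFrame({
    "Country": ["A", "B", "A", "B"],
    "year": [1980, 1980, 1981, 1981],
    "Immigration number": [10, 200, 15, 250],
})

# Toy national GDP series: one value per year (placeholder values).
gdp = pd.DataFrame({"year": [1980, 1981], "Total GDP": [1.0, 2.0]})

# Left merge on year, mirroring the merge used for the third iteration.
merged = immigration.merge(gdp, on="year", how="left")

# Number of distinct GDP values among the countries in each year.
distinct_gdp_per_year = merged.groupby("year")["Total GDP"].nunique()
print(distinct_gdp_per_year.tolist())
```

Because there is exactly one GDP value per year, all bubbles in a frame are the same size and simply grow together as the animation advances, whatever the individual countries' immigration numbers are.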
It is worth noting that this version contains four misleading points:
Lastly, to make the misleading graphic more convincing and seemingly sensible, the first three misleading points listed in iteration 3 have been corrected.
To be specific, the final version removes the grouping by region and adjusts the figure size so that readers can view the entire scale of the visualization. Also, the y-axis label has been changed to "Asian Countries sizing by total GDP," which helps readers understand the meaning of the bubble size. Furthermore, the color scale has been changed to a diverging one, which lets readers interpret the data more intuitively. Note that the final version retains the spurious positive relationship between Asian immigration and total GDP, and presents it in a more compelling way.
Canada_gdp3=Canada_gdp2.copy()
# Drop irrelevant column
Canada_gdp3=Canada_gdp3.drop(["Country"],axis=1)
# Merge the gdp dataset with the immigration dataset
Canada_df4=Canada_df2.merge(Canada_gdp3,on="year",how="left")
# Rename the column to help readers better understand the meaning of bubbles' size
Canada_df4.rename(columns = {'Country':'Asian Countries sizing by total GDP'}, inplace= True)
Canada_df4.head()
# Changing the color and size of the bubbles
fig = px.scatter(
Canada_df4,
x="Immigration number",
y="Asian Countries sizing by total GDP",
animation_frame="year",
animation_group="Asian Countries sizing by total GDP",
size="Total GDP",
color="Immigration number",
#the scale of the color scales has been changed to be diverging
color_continuous_scale=px.colors.diverging.Temps,
title="Changing number of Asian immigrants sizing by total GDP from 1980 to 2013"
)
fig.update_layout(autosize=False,height=1000,width=1100,
font=dict(size=10),xaxis=dict(range=[0,42584]))
fig.show()
In a nutshell, this assignment creates a deceptive bubble chart to depict the impact of Asian immigration on the economy. The diagram builds on the conventional wisdom among economists that new immigrants can boost a country’s economy. However, it combines two datasets without considering confounding variables and, by tying the bubbles’ size to GDP, misleads the reader about the relationship between immigration and economic growth. Media outlets whose writers lack a statistics background, such as business magazines, could mistakenly use the diagram to explain why Canada’s federal government plans to welcome more newcomers in the upcoming years.
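The danger of reading a relationship into two series that both rise over time can be sketched with synthetic data. The example below is an illustration under stated assumptions, not an analysis of the immigration or GDP data: `series_a` and `series_b` are artificial series that share nothing except an upward trend.

```python
import numpy as np

t = np.arange(34)  # 34 time steps, mirroring the 1980-2013 span

# Two synthetic series with unrelated fluctuations around upward trends.
series_a = 100 + 10 * t + 5 * np.sin(t)
series_b = 50 + 3 * t + 2 * np.cos(1.7 * t)

# Correlation of the raw levels is dominated by the shared trend.
corr_levels = np.corrcoef(series_a, series_b)[0, 1]

# Correlation of year-over-year changes removes the linear trend.
corr_changes = np.corrcoef(np.diff(series_a), np.diff(series_b))[0, 1]

print(round(corr_levels, 3), round(corr_changes, 3))
```

The levels are almost perfectly correlated while the year-over-year changes are only weakly related, which is why a chart of two trending quantities cannot by itself show that one drives the other.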